Theoretical Analysis of Geographic Routing in Social Networks 


Ravi Kumar* David Liben-Nowell† Jasmine Novak* Prabhakar Raghavan‡
Andrew Tomkins*


Abstract 


We introduce a formal model for geographic social networks, and define the notion of
rank-based friendship, in which the probability that a person v is a friend of a person u is
inversely proportional to the number of people w who live closer to u than v does. We then
prove our main theorem, showing that rank-based friendship is a sufficient explanation of the 
navigability of any geographic social network that adheres to it. 


1 A Model of Population Networks 


There are two key features that we wish to incorporate into our social-network model: geography 
and population density. We will first describe a very general abstract model; in later sections we 
examine a concrete grid-based instantiation of it. 


Definition 1.1 (Population network) A population network is a 5-tuple $(L, d, P, \mathrm{loc}, E)$ where

• $L$ is a finite set of locations $(\ell, s, t, x, y, z, \ldots)$;

• $d : L \times L \to \mathbb{R}^+$ is an arbitrary distance function on the locations;

• $P$ is a finite ordered set of people $(u, v, w, \ldots)$;

• $\mathrm{loc} : P \to L$ is the location function, which maps people to the location in which they live;
and

• $E \subseteq P \times P$ is the set of friendships between people in the network.


The ordering on people is required only to break ties when comparing distances between two people. 
Let A(L) denote the power set of L. 

Let $\mathrm{pop} : L \to \mathbb{Z}^+$ denote the population of each point on $L$, where
$\mathrm{pop}(\ell) := |\{u \in P : \mathrm{loc}(u) = \ell\}|$. We overload notation, and let
$\mathrm{pop} : A(L) \to \mathbb{Z}^+$ denote the population of a subset of the locations, so that
$\mathrm{pop}(L') := \sum_{\ell \in L'} \mathrm{pop}(\ell)$. We will write $n := \mathrm{pop}(L) = |P|$ to denote the
total population, and $m := |L|$ to denote the total number of locations in the network.

Let $\mathrm{density} : L \to [0,1]$ be a probability distribution denoting the population density of each
location $\ell \in L$, so that $\mathrm{density}(\ell) := \mathrm{pop}(\ell)/n$. We similarly extend
$\mathrm{density} : A(L) \to [0,1]$ so that $\mathrm{density}(L') := \sum_{\ell \in L'} \mathrm{density}(\ell)$. Thus $\mathrm{density}(L) = 1$.

We extend the distance function to accept both locations and people in its arguments, so that we
have the function $d : (P \cup L) \times (P \cup L) \to \mathbb{R}^+$ where $d(u, \cdot) := d(\mathrm{loc}(u), \cdot)$ and
$d(\cdot, v) := d(\cdot, \mathrm{loc}(v))$ for all people $u, v \in P$.

When comparing the distances between people, we will use the ordering on people to break ties. 
For people $u, v, v' \in P$, we will write $d(u,v) < d(u,v')$ as shorthand for $(d(u,v), v) \prec_{\mathrm{lex}} (d(u,v'), v')$.
This tie-breaking role is the only purpose of the ordering on people. 
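
As a concrete illustration of these definitions, the following minimal sketch builds a toy population network. The three locations, the five names, and the helper `closer` are hypothetical choices made only for this example; the one-dimensional distance is likewise an assumption, since the model allows an arbitrary distance function.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy data: three locations on a line and five people.
locations = [0, 1, 2]
loc = {"ann": 0, "bob": 0, "cat": 1, "dan": 2, "eve": 2}
people = sorted(loc)  # the ordering on P, used only to break ties


def d(x, y):
    """Assumed distance function on locations (absolute difference)."""
    return abs(x - y)


pop = Counter(loc.values())   # pop(l) = number of people living at l
n = len(people)               # n = pop(L) = |P|
density = {l: Fraction(pop[l], n) for l in locations}


def closer(u, v, w):
    """True if d(u, v) < d(u, w) under lexicographic tie-breaking."""
    key_v = (d(loc[u], loc[v]), people.index(v))
    key_w = (d(loc[u], loc[w]), people.index(w))
    return key_v < key_w
```

Here `closer("cat", "ann", "bob")` holds even though both live at distance 1 from cat, because ann precedes bob in the ordering on people.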


2 Rank-Based Friendship 


Following the navigable-small-world model of Kleinberg [1, 2], each person in the network will be 
endowed with one long-range link. We diverge from the model of Kleinberg in the definition of our 
long-range links. Instead of distance, the fundamental quantity upon which we base our model of 
long-range links is rank: 


Definition 2.1 (Rank) For two people $u, v \in P$, the rank of $v$ with respect to $u$ is defined as
\[ \mathrm{rank}_u(v) := |\{w \in P : d(u,w) < d(u,v)\}|. \]


Note that by breaking ties in distance consistently according to the ordering on $P$, for any
$i \in \{1, \ldots, n\}$ and any person $u \in P$, there is exactly one person $v$ such that $\mathrm{rank}_u(v) = i$. We now
define a model for generating a rank-based social network using this notion: 


Definition 2.2 (Rank-based friendship) For each person $u$ in the network, we generate one
long-range link from $u$, where
\[ \Pr[u \text{ links to } v] \propto \frac{1}{\mathrm{rank}_u(v)}. \]

Intuitively, one justification for rank-based friendship is the following: person $v$ will have to compete
with all of the more “convenient” candidate friends for person $u$, i.e., all people $w$ who live closer
to $u$ than $v$ does. Note that, for any person $u$, we have $\sum_v 1/\mathrm{rank}_u(v) = \sum_{i=1}^{n} 1/i = H_n$, the $n$th
harmonic number. Therefore, by normalizing, we have that
\[ \Pr[u \text{ links to } v] = \frac{1}{H_n \cdot \mathrm{rank}_u(v)}. \tag{1} \]

Under rank-based friendship, the probability of a link from u to v depends only on the number 
of people within distance d(u,v) of u, and not on the geographic distance itself. 
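
The normalization in (1) can be checked on a small example. The sketch below is illustrative only: the function names and the four-person toy population are hypothetical, and it assumes that ranks are the 1-based positions in the tie-broken distance ordering from $u$ (so that they range over $\{1, \ldots, n\}$ as stated above, with $u$ itself included in the ordering).

```python
from fractions import Fraction


def rank_table(u, people, loc, d):
    """Map each person v to a 1-based rank with respect to u.

    Assumption: ties in distance are broken by position in `people`.
    """
    order = sorted(people, key=lambda v: (d(loc[u], loc[v]), people.index(v)))
    return {v: i for i, v in enumerate(order, start=1)}


def link_probabilities(u, people, loc, d):
    """Pr[u links to v] = 1 / (H_n * rank_u(v)), as exact fractions."""
    ranks = rank_table(u, people, loc, d)
    h_n = sum(Fraction(1, i) for i in range(1, len(people) + 1))
    return {v: 1 / (h_n * r) for v, r in ranks.items()}


# Hypothetical toy population on a line.
people = ["ann", "bob", "cat", "dan"]
loc = {"ann": 0, "bob": 1, "cat": 1, "dan": 3}
probs = link_probabilities("ann", people, loc, lambda x, y: abs(x - y))
```

Note that bob and cat are equidistant from ann yet receive different probabilities, purely because of the tie-breaking order; the actual distance to dan plays no role beyond determining that dan is ranked last.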

One important feature of this model is that it is independent of the dimensionality of the space
in which people live. For example, in the $k$-dimensional grid with uniform population density and
the $L_1$ distance on locations, we have that $|\{w : d(u,w) \le \delta\}| \propto \delta^k$, so the probability that person
$u$ links to person $v$ is proportional to $d(u,v)^{-k}$. That is, the rank of a person $v$ with respect to a
person $u$ satisfies $\mathrm{rank}_u(v) \approx d(u,v)^k$. Thus, although this model has been defined without explicitly
embedding the locations in a metric space, our rank-based formulation gives essentially the same
long-distance link probabilities as Kleinberg’s model for a uniform-population k-dimensional mesh.
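
The growth rate $\delta^k$ can be verified by brute force for small cases. The following sketch counts integer points in an $L_1$-ball around the origin of an unbounded integer lattice, assuming one person per point; the function name is hypothetical, and boundary effects of a finite mesh are ignored.

```python
from itertools import product


def l1_ball_size(delta, k):
    """Number of integer points within L1 distance delta of the origin in Z^k."""
    return sum(
        1
        for p in product(range(-delta, delta + 1), repeat=k)
        if sum(abs(c) for c in p) <= delta
    )


# For k = 2 the count is exactly 2*delta^2 + 2*delta + 1, i.e. it grows as delta^2.
sizes_2d = [l1_ball_size(delta, 2) for delta in range(5)]
```

For $k = 2$ the count is a quadratic in $\delta$, so under these assumptions the rank of a person at distance $\delta$ is roughly $2\delta^2$ and the link probability falls off as $\delta^{-2}$.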


3  Rank-Based Friendship on Meshes 


In the following, our interest will lie in population networks that are formed from meshes with
arbitrary population densities. Let $L := \{1, \ldots, q\}^k$ denote the points on the $k$-dimensional mesh,
with length $q$ on each side. We write $x = \langle x_1, \ldots, x_k \rangle$ for a location $x \in L$. We will consider
Manhattan distance ($L_1$ distance) on the mesh, so that
$d(\langle x_1, \ldots, x_k \rangle, \langle y_1, \ldots, y_k \rangle) = \sum_{i=1}^{k} |x_i - y_i|$.
The only restriction that we impose on the people in the network is that $\mathrm{pop}(\ell) > 0$ for every $\ell \in L$;
that is, there are no ghost towns with zero population. This assumption will allow us to avoid the
issue of disconnected sets in what follows.

Thus a mesh population network is fully specified by the dimensionality $k$, the side length $q$,
the population $P$ (with an ordering to break ties in interpersonal distances), the friendship set $E$,
and the location function $\mathrm{loc} : P \to \{1, \ldots, q\}^k$, where for every location $\ell \in \{1, \ldots, q\}^k$, there
exists a person $u_\ell \in P$ such that $\mathrm{loc}(u_\ell) = \ell$.

Following Kleinberg’s model of navigable small worlds, we include local links in $E$ for each
person in the network. For now, we assume that each person $u$ at location $\ell^{(u)} = \mathrm{loc}(u)$ in the
network has a local link to some person at the mesh point in each cardinal direction from $\ell^{(u)}$, i.e.,
to some person at each of the $2k$ points
$\langle \ell^{(u)}_1, \ldots, \ell^{(u)}_{i-1}, \ell^{(u)}_i \pm 1, \ell^{(u)}_{i+1}, \ldots, \ell^{(u)}_k \rangle$, for any coordinate
$i \in \{1, \ldots, k\}$. Thus, for any two people $u$ and $v$, there exists a path of length at most $qk$ between
them, and, more specifically, the geographically greedy algorithm will find a path of length no
longer than $qk$.

In a rank-based mesh population network, we add one long-range link to $E$ per person in $P$,
where that link is chosen probabilistically by rank, according to (1).


4 The Two-Dimensional Grid 


For concreteness, we focus on the two-dimensional grid, where we have $L := \{1, \ldots, q\} \times \{1, \ldots, q\}$,
and thus $m = |L| = q^2$. We may think of the two-dimensional grid as representing the intersection
of integral lines of longitude and latitude, for example. 

In this section, we will show that the geographically greedy algorithm on the two-dimensional
grid produces paths that are on average very short. More precisely, the expected length of
the path found by the geographically greedy algorithm is bounded by $O(\log^3 n)$ when the target is
chosen randomly from the population $P$. Formally, the geographically greedy algorithm GeoGreedy
proceeds as follows: given a target $t$ and a current message-holder $u$, person $u$ examines her set of
friends, and forwards the message to the friend $v$ of $u$ who is geographically closest to the target $t$.
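
The forwarding rule just described can be sketched in a few lines. This is a simplified illustration rather than the model in full generality: it assumes exactly one person per location (so people are identified with their coordinates), and the names `geo_greedy`, `local_neighbors`, and `long_link` are hypothetical.

```python
def l1(x, y):
    """Manhattan distance between two coordinate tuples."""
    return sum(abs(a - b) for a, b in zip(x, y))


def local_neighbors(x, q):
    """Mesh points one step from x in each cardinal direction, inside {1..q}^k."""
    out = []
    for i in range(len(x)):
        for step in (-1, 1):
            c = x[i] + step
            if 1 <= c <= q:
                out.append(x[:i] + (c,) + x[i + 1:])
    return out


def geo_greedy(s, t, q, long_link=None):
    """Return the greedy path from s to t.

    long_link is an optional dict mapping a person to her one long-range friend.
    """
    long_link = long_link or {}
    path, u = [s], s
    while u != t:
        friends = local_neighbors(u, q)
        if u in long_link:
            friends.append(long_link[u])
        # Forward to the friend geographically closest to the target.
        u = min(friends, key=lambda v: l1(v, t))
        path.append(u)
    return path
```

Because some local neighbor is always strictly closer to the target, the loop terminates; with local links alone the path from `(1, 1)` to `(4, 3)` takes five hops, while a single long-range link from `(1, 1)` to `(4, 4)` shortens it to two.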
First, we need a few definitions: 


Definition 4.1 ($L_1$-ball) For any location $x \in L$ and for any radius $r \ge 0$, let
\[ B_r(x) = \{y \in L : d(x,y) \le r\} = \{y \in L : |x_1 - y_1| + |x_2 - y_2| \le r\} \]
denote the $L_1$-ball of radius $r$ centered at location $x$.


We consider an exponentially growing set $\mathcal{R} := \{2^i : i \in \{1, 2, 4, \ldots, 128\lceil \log q \rceil\}\}$ of ball radii, and
we place a series of increasingly fine-grained collections of balls that cover the grid:


Definition 4.2 (Covering radius-$r$ ball centers) For any radius $r \in \mathcal{R}$, let the set
\[ \mathcal{C}_r := \{z \in L : 2z_1 \bmod r = 2z_2 \bmod r = 0\} \]
be the set of locations $z$ such that $z_i/r$ is half-integral for $i \in \{1, 2\}$.


For each radius $r \in \mathcal{R}$, we will consider the set of radius-$r$ balls centered at each of the locations
in $\mathcal{C}_r$. We begin with a few simple facts about these $L_1$-balls:


Fact 4.3 (Only a small number of balls in $\mathcal{C}_r$ overlap) For each radius $r \in \mathcal{R}$:

1. For each location $x \in L$, we have that $|\{z \in \mathcal{C}_r : d(z,x) \le r\}| \le 25$.

2. For each location $z \in \mathcal{C}_r$, we have that $|\{z' \in \mathcal{C}_{r/2} : B_{r/2}(z') \cap B_r(z) \ne \emptyset\}| \le 169$.


Proof. For the first claim, note that if $|z_1 - x_1| > r$ or if $|z_2 - x_2| > r$, then $d(z,x) > r$, and $z$ is
not an element of the set of relevance. Thus every $z \in \mathcal{C}_r$ such that $d(z,x) \le r$ must fall into the
range $\langle x_1 \pm r, x_2 \pm r \rangle$. There are at most five half-integral values of $z/r$ that can fall into the range
$[b, b + 2r]$ for any $b$, so there are at most twenty-five total points $z \in \mathcal{C}_r$ that satisfy $d(x,z) \le r$.

For the second claim, notice that any ball of radius $r/2$ that has a nonempty intersection with
$B_r(z)$ must have its center at a point $z'$ such that $d(z,z') \le 3r/2$. Thus the only $z' \in \mathcal{C}_{r/2}$ that
could be in the set of relevance must have $z'_i \in [z_i - 3r/2, z_i + 3r/2]$ for $i \in \{1, 2\}$ and have $z'_i/(r/2)$
be half-integral. As in the first claim, the number of half-integral values of $2z'/r$ that can fall into
the range $[b, b + 3r]$ is at most thirteen for any $b$. Thus there can be at most 169 total points
$z' \in \mathcal{C}_{r/2}$ so that $B_{r/2}(z') \cap B_r(z) \ne \emptyset$.
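
Both bounds in Fact 4.3 can be checked exhaustively on a small grid. The sketch below is a brute-force illustration only; the function names are hypothetical, and the choice $q = 16$, $r = 8$ in the usage note is an arbitrary example rather than anything prescribed by the text.

```python
def l1(x, y):
    return abs(x[0] - y[0]) + abs(x[1] - y[1])


def grid(q):
    return [(i, j) for i in range(1, q + 1) for j in range(1, q + 1)]


def centers(r, q):
    """C_r: locations z with 2*z_1 mod r == 2*z_2 mod r == 0."""
    return [z for z in grid(q) if (2 * z[0]) % r == 0 and (2 * z[1]) % r == 0]


def ball(z, r, q):
    """B_r(z) as a set of grid locations."""
    return {y for y in grid(q) if l1(z, y) <= r}


def max_centers_within_r(r, q):
    """Largest number of centers in C_r within distance r of any location."""
    cs = centers(r, q)
    return max(sum(1 for z in cs if l1(z, x) <= r) for x in grid(q))


def max_half_balls_meeting(r, q):
    """Largest number of radius-r/2 balls (centers in C_{r/2}) meeting one B_r(z)."""
    fine = [ball(zp, r // 2, q) for zp in centers(r // 2, q)]
    return max(
        sum(1 for b in fine if b & ball(z, r, q))
        for z in centers(r, q)
    )
```

For instance, `max_centers_within_r(8, 16)` and `max_half_balls_meeting(8, 16)` stay below 25 and 169 respectively, consistent with the (non-tight) constants in the proof.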


Fact 4.4 (Relation between balls centered in $L$ and in $\mathcal{C}_r$) For each location $x \in L$ and for
each radius $r \in \mathcal{R}$:

1. There exists a location $z \in \mathcal{C}_r$ such that $B_{r/2}(x) \subseteq B_r(z)$.

2. There exists a location $z' \in \mathcal{C}_{r/2}$ such that $B_{r/2}(z') \subseteq B_r(x)$ and $x \in B_{r/2}(z')$.

Proof. For the first claim, let $z \in \mathcal{C}_r$ be the closest point to $x$ in $\mathcal{C}_r$. Note that $x_1 \in [z_1 - r/4, z_1 + r/4]$;
otherwise $x$ would be strictly closer to either $\langle z_1 - r/2, z_2 \rangle \in \mathcal{C}_r$ or $\langle z_1 + r/2, z_2 \rangle \in \mathcal{C}_r$. Similarly we
have $x_2 \in [z_2 - r/4, z_2 + r/4]$. Therefore we have $d(x,z) = \sum_{i \in \{1,2\}} |x_i - z_i| \le r/2$. Let $y \in B_{r/2}(x)$
be arbitrary. Then by the triangle inequality we have $d(z,y) \le d(z,x) + d(x,y) \le r/2 + r/2 = r$.
Thus we have $y \in B_r(z)$, which proves the claim.

For the second claim, let $z' \in \mathcal{C}_{r/2}$ be the closest point to $x$ in $\mathcal{C}_{r/2}$. By the same argument as
above, we have $d(x,z') \le r/4$. Immediately we have $x \in B_{r/2}(z')$. Let $y \in B_{r/2}(z')$ be arbitrary.
Then $d(x,y) \le d(x,z') + d(z',y) \le r/4 + r/2 < r$, and $y \in B_r(x)$, which proves the claim.


Let $x$ and $y$ be two arbitrary locations in $L$. In what follows, we will use the size of the smallest
ball in $\bigcup_{r \in \mathcal{R}} \{B_r(z) : z \in \mathcal{C}_r\}$ that includes both $x$ and $y$ as a ceiling-like proxy for $d(x,y)$, and as
the measure of progress towards the target. We will also need a large ball from $\{B_r(z) : z \in \mathcal{C}_r\}$
that includes both $x$ and $y$ and also includes a large ball centered at $y$.


Definition 4.5 (Minimum enclosing-ball radius) For any two locations $x, y \in L$, let $\mathrm{mebr}(x,y)$
(“minimum enclosing-ball radius”) denote the minimum $r \in \mathcal{R}$ such that, for some $z \in \mathcal{C}_r$, we have
$x, y \in B_r(z)$.


Fact 4.6 (Relating distance and minimum enclosing-ball radius) For any $x, y \in L$, let $r :=
\mathrm{mebr}(x,y)$. Then we have $2r \ge d(x,y) \ge r/4$.


Proof. Let $z \in \mathcal{C}_r$ be such that $x, y \in B_r(z)$, and note that, by definition, there is no $z' \in \mathcal{C}_{r/2}$
such that $x, y \in B_{r/2}(z')$. The first direction is easy: by the triangle inequality, we have that
$d(x,y) \le d(x,z) + d(z,y) \le r + r = 2r$. For the other direction, suppose for a contradiction that
$d(x,y) < r/4$. Let $z^* \in \mathcal{C}_{r/2}$ be such that $B_{r/4}(x) \subseteq B_{r/2}(z^*)$, as guaranteed by Fact 4.4(1). But then
we have $x, y \in B_{r/4}(x)$ because $d(x,y) < r/4$, which implies that $x, y \in B_{r/2}(z^*)$, which in turn
contradicts the minimality of $r$.
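
To make the definition of mebr concrete, the following sketch computes it by direct search on a small grid. The function names are hypothetical, and the list of radii is an assumption: it takes the powers of two starting at 2, which is one reading of the set $\mathcal{R}$ above.

```python
def l1(x, y):
    return abs(x[0] - y[0]) + abs(x[1] - y[1])


def centers(r, q):
    """C_r: grid locations z with 2*z_1 mod r == 2*z_2 mod r == 0."""
    return [
        (i, j)
        for i in range(1, q + 1)
        for j in range(1, q + 1)
        if (2 * i) % r == 0 and (2 * j) % r == 0
    ]


def mebr(x, y, q, radii):
    """Smallest r in radii such that some ball B_r(z), z in C_r, contains x and y."""
    for r in sorted(radii):
        if any(l1(x, z) <= r and l1(y, z) <= r for z in centers(r, q)):
            return r
    return None  # no listed radius is large enough


radii = [2, 4, 8, 16, 32]          # assumed reading of the radius set
r = mebr((3, 3), (10, 9), 16, radii)
```

With $x = (3,3)$ and $y = (10,9)$ we have $d(x,y) = 13$; the search returns $r = 8$ (the ball centered at $(8,4)$ contains both points), so $2r = 16 \ge 13 \ge 2 = r/4$, in agreement with Fact 4.6.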


Thus, in the path from any source $s \in L$ to any target $t \in L$ found by GeoGreedy, the path will
always remain inside the ball $B_{d(s,t)}(t) \subseteq B_{2 \cdot \mathrm{mebr}(s,t)}(t)$.


Definition 4.7 (Sixteenfold enclosing ball) Let $x, y \in L$ be an arbitrary pair of locations, and
let $r = \mathrm{mebr}(x,y)$. Let $\mathrm{sebc}(y,r)$ (“sixteenfold enclosing ball center”) denote the location $z^*_{y,r} \in \mathcal{C}_{16r}$
such that $B_{8r}(y) \subseteq B_{16r}(z^*_{y,r})$, whose existence is guaranteed by Fact 4.4(1).


Lemma 4.8 (Relationship between ball population and rank) Let $s, t \in L$ be an arbitrary
source/target pair of locations. Let $r = \mathrm{mebr}(s,t)$, and let $z^* = \mathrm{sebc}(t,r)$. Let $x, y \in L$ be arbitrary
locations such that $x \in B_{2r}(t)$ and $y \in B_{r/8}(t)$, and let $u, v \in P$ be arbitrary people such that
$\mathrm{loc}(u) = x$ and $\mathrm{loc}(v) = y$. Then $\mathrm{rank}_u(v) \le \mathrm{pop}(B_{16r}(z^*))$.


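Proof. First, we note
\begin{align*}
d(x,y) &\le d(x,t) + d(t,y) && \text{triangle inequality} \\
       &\le 2r + r/8 && \text{assumptions that } x \in B_{2r}(t) \text{ and } y \in B_{r/8}(t) \\
       &= 17r/8. \tag{2}
\end{align*}

We now claim the following:
\[ \text{for any location } \ell \in L, \text{ if } d(x,\ell) \le d(x,y), \text{ then } d(z^*,\ell) \le 16r. \tag{3} \]

To prove (3), let $\ell$ be an arbitrary location so that $d(x,\ell) \le d(x,y)$. Then we have
\begin{align*}
d(t,\ell) &\le d(t,y) + d(y,x) + d(x,\ell) && \text{triangle inequality} \\
          &\le d(t,y) + d(y,x) + d(x,y) && \text{assumption that } d(x,\ell) \le d(x,y) \\
          &\le r/8 + d(y,x) + d(x,y) && \text{assumption that } y \in B_{r/8}(t) \\
          &\le r/8 + 17r/8 + 17r/8 && \text{(2)} \\
          &= 35r/8.
\end{align*}

Then, we have that $\ell \in B_{35r/8}(t) \subseteq B_{8r}(t) \subseteq B_{16r}(z^*)$ by the definition of $z^* = \mathrm{sebc}(t,r)$, which
proves (3). Now, by definition of rank, we have that
\begin{align*}
\mathrm{rank}_u(v) &= |\{w \in P : d(u,w) < d(u,v)\}| \\
                   &\le \sum_{\ell \in L : d(x,\ell) \le d(x,y)} \mathrm{pop}(\ell) \\
                   &= \mathrm{pop}(\{\ell \in L : d(x,\ell) \le d(x,y)\}) \\
                   &\le \mathrm{pop}(\{\ell \in L : d(\ell,z^*) \le 16r\}) \\
                   &= \mathrm{pop}(B_{16r}(z^*)),
\end{align*}
where the second inequality follows from (3).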


We are now ready to prove the main technical result of this section, namely that the geographically 
greedy algorithm will halve the distance from the source to the target in a polylogarithmic expected 
number of steps, for a randomly chosen target person. 


Lemma 4.9 (GeoGreedy halves distance in expected polylogarithmic steps) Let $s \in L$ be
an arbitrary source location, and let $t \in L$ be a randomly chosen target location, according to the
distribution $\mathrm{density}(\cdot)$. Then the expected number of steps before the geographically greedy algorithm
started from location $s$ reaches a point in $B_{d(s,t)/2}(t)$ is $O(\log n \log m) = O(\log^2 n)$, where the
expectation is taken over the random choice of $t$.


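Proof. Let $r_t := \mathrm{mebr}(s,t)$, and let $z_t := \mathrm{sebc}(t, r_t)$ so that
\[ z_t \in \mathcal{C}_{16r_t} \quad\text{and}\quad B_{8r_t}(t) \subseteq B_{16r_t}(z_t). \tag{4} \]
Let $z'_t$ be the location whose existence is guaranteed by Fact 4.4(2) such that
\[ z'_t \in \mathcal{C}_{r_t/16} \quad\text{and}\quad B_{r_t/16}(z'_t) \subseteq B_{r_t/8}(t) \quad\text{and}\quad t \in B_{r_t/16}(z'_t). \tag{5} \]
Putting together (4) and (5), we have the following two facts:
\[ B_{r_t/16}(z'_t) \subseteq B_{r_t/8}(t) \subseteq B_{8r_t}(t) \subseteq B_{16r_t}(z_t) \tag{6} \]
\[ t \in B_{r_t/16}(z'_t). \tag{7} \]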


By Fact 4.6, we know that $d(s,t)/2 \ge r_t/8$. Thus it will suffice to show that the expected number
of steps before GeoGreedy started from location $s$ lands in $B_{r_t/8}(t) \subseteq B_{d(s,t)/2}(t)$ is $O(\log n \log m)$.

Suppose that we start GeoGreedy at the source $s$, and the current point on the path found by
the algorithm is some person $u \in P$, at location $x_u = \mathrm{loc}(u)$. By definition, every step of GeoGreedy
decreases the distance from the current location to the target $t$, so we have that
\[ d(x_u, t) \le d(s,t) \le 2r_t. \tag{8} \]


We refer to a person $u$ as good if there exists a long-range link from that person to any person
living in the ball $B_{r_t/8}(t)$. Let $\alpha_{u,t}$ denote the probability that a person $u \in P$ living at location
$x_u = \mathrm{loc}(u) \in L$ is good. Then
\[
\alpha_{u,t}
= \sum_{v : \mathrm{loc}(v) \in B_{r_t/8}(t)} \frac{1}{\mathrm{rank}_u(v) \cdot H_n}
\ge \sum_{v : \mathrm{loc}(v) \in B_{r_t/8}(t)} \frac{1}{\mathrm{pop}(B_{16r_t}(z_t)) \cdot H_n}
= \frac{\mathrm{pop}(B_{r_t/8}(t))}{\mathrm{pop}(B_{16r_t}(z_t)) \cdot H_n}
\]
by the definition of good, by Lemma 4.8 (which applies by (8)), and by the definition of $\mathrm{pop}(\cdot)$.
Noting that the lower bound on $\alpha_{u,t}$ is independent of $u$, we write
\[
\alpha_t := \frac{\mathrm{pop}(B_{r_t/8}(t))}{\mathrm{pop}(B_{16r_t}(z_t)) \cdot H_n} \le \alpha_{u,t}.
\]


Thus the probability that $u$ is good is at least $\alpha_t$ for every person $u$ along the GeoGreedy path.
Furthermore, each step of the algorithm brings us to a new node never seen before by the algorithm
because the distance to $t$ is strictly decreasing until we reach node $t$. Thus the probability of finding
a good long-range link is independent at each step of the algorithm until it terminates. Therefore,
the expected number of steps before we reach a good person (or $t$ itself) is at most $1/\alpha_t$.
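
The bound $1/\alpha_t$ is the mean of a geometric distribution. As a short worked step, assuming for simplicity that each step succeeds independently with probability exactly $\alpha_t$ (the argument above guarantees at least $\alpha_t$, which can only shorten the wait), the expected number of steps is:

```latex
% Assumption: independent trials, each succeeding with probability exactly \alpha_t.
% Uses the identity \sum_{j \ge 1} j x^{j-1} = 1/(1-x)^2 for |x| < 1, with x = 1 - \alpha_t.
\mathbb{E}[\text{steps}]
  = \sum_{j \ge 1} j \,\alpha_t (1-\alpha_t)^{j-1}
  = \alpha_t \cdot \frac{1}{\bigl(1-(1-\alpha_t)\bigr)^{2}}
  = \frac{1}{\alpha_t}.
```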




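We now examine the expected value of $1/\alpha_t$ for a target location $t \in L$ chosen according to the
distribution $\mathrm{density}(\cdot)$:
\begin{align*}
\mathbb{E}_{t \in L \sim \mathrm{density}(\cdot)}[1/\alpha_t]
  &= \sum_t \mathrm{density}(t) \cdot \frac{1}{\alpha_t} \\
  &= \sum_t \mathrm{density}(t) \cdot \frac{\mathrm{pop}(B_{16r_t}(z_t)) \cdot H_n}{\mathrm{pop}(B_{r_t/8}(t))} \\
  &= H_n \cdot \sum_t \mathrm{density}(t) \cdot \frac{\mathrm{density}(B_{16r_t}(z_t))}{\mathrm{density}(B_{r_t/8}(t))} \\
  &\le H_n \cdot \sum_t \mathrm{density}(t) \cdot \frac{\mathrm{density}(B_{16r_t}(z_t))}{\mathrm{density}(B_{r_t/16}(z'_t))}.
\end{align*}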


The equalities follow from the definition of expectation, the definition of $\alpha_t$, and the fact that
$\mathrm{density}(\cdot) = \mathrm{pop}(\cdot)/n$. The inequality follows from the definition of $z'_t$ in (5), using the fact that
$B_{r_t/16}(z'_t) \subseteq B_{r_t/8}(t)$ and the monotonicity of $\mathrm{density}(\cdot)$.

We now reindex the summation to be over radii and ball centers rather than over targets $t$.
Recall that $z_t \in \mathcal{C}_{16r_t}$ and $z'_t \in \mathcal{C}_{r_t/16}$, and that $B_{r_t/16}(z'_t) \subseteq B_{16r_t}(z_t)$ by (6), and therefore that
$z'_t \in B_{16r_t}(z_t)$. Thus, we have that


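\begin{align*}
\mathbb{E}_{t \in L \sim \mathrm{density}(\cdot)}[1/\alpha_t]
  &\le H_n \cdot \sum_t \mathrm{density}(t) \cdot \frac{\mathrm{density}(B_{16r_t}(z_t))}{\mathrm{density}(B_{r_t/16}(z'_t))} \\
  &\le H_n \cdot \sum_{r \in \mathcal{R}} \sum_{z \in \mathcal{C}_{16r}} \sum_{z' \in \mathcal{C}_{r/16} : z' \in B_{16r}(z)}
       \frac{\mathrm{density}(B_{16r}(z))}{\mathrm{density}(B_{r/16}(z'))} \sum_{t : z'_t = z'} \mathrm{density}(t) \\
  &\le H_n \cdot \sum_{r \in \mathcal{R}} \sum_{z \in \mathcal{C}_{16r}} \sum_{z' \in \mathcal{C}_{r/16} : z' \in B_{16r}(z)}
       \frac{\mathrm{density}(B_{16r}(z))}{\mathrm{density}(B_{r/16}(z'))} \sum_{t \in B_{r/16}(z')} \mathrm{density}(t),
\end{align*}
where the last inequality follows from (7). But then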


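\begin{align*}
\mathbb{E}_{t \in L \sim \mathrm{density}(\cdot)}[1/\alpha_t]
  &\le H_n \cdot \sum_{r \in \mathcal{R}} \sum_{z \in \mathcal{C}_{16r}} \sum_{z' \in \mathcal{C}_{r/16} : z' \in B_{16r}(z)}
       \frac{\mathrm{density}(B_{16r}(z))}{\mathrm{density}(B_{r/16}(z'))} \sum_{t \in B_{r/16}(z')} \mathrm{density}(t) \\
  &= H_n \cdot \sum_{r \in \mathcal{R}} \sum_{z \in \mathcal{C}_{16r}} \sum_{z' \in \mathcal{C}_{r/16} : z' \in B_{16r}(z)}
       \frac{\mathrm{density}(B_{16r}(z))}{\mathrm{density}(B_{r/16}(z'))} \cdot \mathrm{density}(B_{r/16}(z')) \\
  &= H_n \cdot \sum_{r \in \mathcal{R}} \sum_{z \in \mathcal{C}_{16r}} \sum_{z' \in \mathcal{C}_{r/16} : z' \in B_{16r}(z)} \mathrm{density}(B_{16r}(z)) \\
  &= H_n \cdot \sum_{r \in \mathcal{R}} \sum_{z \in \mathcal{C}_{16r}} \mathrm{density}(B_{16r}(z)) \cdot |\{z' \in \mathcal{C}_{r/16} : z' \in B_{16r}(z)\}|.
\end{align*}

Now we are almost done: by applying Fact 4.3(2) a constant number of times, we have that
\[ |\{z' \in \mathcal{C}_{r/16} : z' \in B_{16r}(z)\}| = O(1). \tag{9} \]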


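Furthermore, we have $\sum_{z \in \mathcal{C}_{16r}} \mathrm{density}(B_{16r}(z)) \le 25$: by Fact 4.3(1), there are at most twenty-five balls
centered in $\mathcal{C}_{16r}$ that include any particular location, so we are simply summing a probability distribution
with some “double counting,” but counting each point at most twenty-five times. Thus we have
\begin{align*}
\mathbb{E}_{t \in L \sim \mathrm{density}(\cdot)}[1/\alpha_t]
  &\le H_n \cdot \sum_{r \in \mathcal{R}} \sum_{z \in \mathcal{C}_{16r}} \mathrm{density}(B_{16r}(z)) \cdot |\{z' \in \mathcal{C}_{r/16} : z' \in B_{16r}(z)\}| \\
  &\le H_n \cdot O(1) \cdot \sum_{r \in \mathcal{R}} \sum_{z \in \mathcal{C}_{16r}} \mathrm{density}(B_{16r}(z)) \\
  &\le H_n \cdot O(1) \cdot \sum_{r \in \mathcal{R}} 25 \\
  &= H_n \cdot O(1) \cdot 25 \cdot |\mathcal{R}| = H_n \cdot O(\log q) = O(\log n \log m),
\end{align*}
because $|\mathcal{R}| = O(\log q) = O(\log m)$.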


In the case of uniform population density, the value of $\alpha_{u,t} = \Omega(1/\log n)$ is independent of $s$ and $t$,
and the greedy algorithm finds an $s$-$t$ path of length $O(\log^2 n)$ with high probability [1, 2].


Theorem 4.10 (GeoGreedy finds short paths in all 2-D meshes) For any 2-dimensional mesh
population network with $n$ people and $m$ locations, the expected length of the search path found
by GeoGreedy from an arbitrarily chosen source location $s$ and a uniformly chosen target $t$ is
$O(\log n \log^2 m) = O(\log^3 n)$.


Proof. Immediate by inductive application of Lemma 4.9: the expected number of hops required
before moving to a node $s'$ with $d(s',t) \le d(s,t)/2$ or $t$ itself is $O(\log n \log m)$; by repeating this
process $O(\log(\max_{s,t} d(s,t))) = O(\log qk) = O(\log q^k) = O(\log m)$ times, we must arrive at the
target node $t$ itself.
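
The behaviour described by the theorem can be explored empirically with a small simulation. The sketch below is illustrative only and is not part of the proof: it assumes a uniform population with exactly one person per location, takes ranks as 1-based positions in the tie-broken distance ordering, and uses hypothetical function names and parameter values.

```python
import random


def l1(x, y):
    return abs(x[0] - y[0]) + abs(x[1] - y[1])


def build_long_links(q, rng):
    """Give every person on the q x q grid one long-range link, chosen by rank."""
    people = [(i, j) for i in range(1, q + 1) for j in range(1, q + 1)]
    weights = [1.0 / i for i in range(1, len(people) + 1)]  # proportional to 1/rank
    links = {}
    for u in people:
        by_rank = sorted(people, key=lambda v: (l1(u, v), v))
        links[u] = rng.choices(by_rank, weights=weights, k=1)[0]
    return people, links


def greedy_hops(s, t, q, links):
    """Number of GeoGreedy hops from s to t using local and long-range links."""
    hops, u = 0, s
    while u != t:
        cand = [links[u]]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v = (u[0] + dx, u[1] + dy)
            if 1 <= v[0] <= q and 1 <= v[1] <= q:
                cand.append(v)
        u = min(cand, key=lambda v: l1(v, t))
        hops += 1
    return hops


def mean_hops(q=12, trials=200, seed=0):
    """Average hop count over random source/target pairs on one sampled network."""
    rng = random.Random(seed)
    people, links = build_long_links(q, rng)
    total = sum(
        greedy_hops(rng.choice(people), rng.choice(people), q, links)
        for _ in range(trials)
    )
    return total / trials
```

Since every hop moves strictly closer to the target, the hop count never exceeds the Manhattan distance between source and target; comparing `mean_hops` across several values of `q` gives a rough feel for how slowly the average grows.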